Abstract


The research was a probabilistic study of neural network models. It was not
oriented toward the workings of a particular device, but was intended to provide
an understanding of the basic mechanisms of learning and recognition in neural
networks. The main areas of progress were analysis of neural network models,
study of network connectivity, and investigation of computer network theory.
1 Introduction


The research dealt with probabilistic analysis of neural networks. The principal
investigators were William G. Faris and Charles M. Newman. Robert S. Maier also
participated in the project.

The analysis was intended to provide an understanding of the basic mechanisms of
the elementary components of neural network recognition and memory devices. The
research was less concerned with building practical devices than with a
mathematical analysis of the basic phenomena. However, the analyses will
eventually be useful in understanding the components of much more complicated
realistic devices.

There were three main components of research:

• Neural Network Models

• Network Connectivity

• Computer Networks

These components are explained in the following sections.

2 Neural Network Models

2.1 Reliability of neural networks

The concept of a set of associated random variables provides a nonlinear
generalization of the notion of a set of positively correlated random variables.
This concept is useful in analyzing the reliability of neural network associative
memory devices. In particular, it allows estimates of the probability of
successful retrieval of individual bits in a memorized pattern to be extended to
a lower bound on the probability of error-free retrieval of the pattern as a
whole [1].

There are a number of standard neural network models of memory in which
associations are stored in connections between nodes. In order to analyze when
such a memory device can function reliably and when it becomes overloaded, it is
useful to treat the memorized items as random. However, when this is done, the
events that the various possible connections are present are not independent. A
rigorous analysis must use some concept that goes beyond independence.

The notion of associated random variables is one such tool. A set of associated
random variables has the property that every pair of variables is positively
correlated (has nonnegative covariance). However, the notion is stronger, since
the same property also holds for every pair of nondecreasing (and in general
nonlinear) functions of all the variables. Many of the properties of independent
random variables hold for associated random variables, notably, under suitable
conditions, the central limit theorem. (Another terminology has arisen in
statistical mechanics; associated random variables are said to obey the FKG
inequalities.)
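Independent random variables are the simplest associated family (Harris's inequality, a special case of the FKG inequalities). The following sketch, which is illustrative and not taken from [1], checks the defining property exactly for three independent Bernoulli(p) bits: two coordinatewise nondecreasing functions of the bits have nonnegative covariance. The functions `f_and` and `g_or` are hypothetical examples.

```python
from fractions import Fraction
from itertools import product


def covariance(f, g, n, p):
    """Exact Cov(f(X), g(X)) for X = (X_1, ..., X_n) i.i.d. Bernoulli(p)."""
    ef = eg = efg = Fraction(0)
    for x in product((0, 1), repeat=n):
        k = sum(x)
        w = p**k * (1 - p) ** (n - k)  # probability of configuration x
        ef += w * f(x)
        eg += w * g(x)
        efg += w * f(x) * g(x)
    return efg - ef * eg


def f_and(x):
    # nondecreasing: 1 iff bits 1 and 2 are both on
    return x[0] * x[1]


def g_or(x):
    # nondecreasing: 1 iff bit 2 or bit 3 is on
    return max(x[1], x[2])


print(covariance(f_and, g_or, 3, Fraction(1, 2)))  # prints 1/16
```

For these two functions the covariance works out to p²(1 − p)², which is 1/16 at p = 1/2. Replacing `g_or` by a nonincreasing function such as 1 − x[1] makes the covariance negative, which is why monotonicity is part of the definition.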

The model is an m-by-n matrix of connections in which pairs of patterns are
stored, and from which an output pattern may be retrieved upon presentation of
the corresponding input pattern. Each input pattern is a random pattern of n 0’s
and 1’s, the probability of a 1 being p. Similarly, each output pattern is a
random pattern of m 0’s and 1’s, the probability of a 1 being q. The entries in
the matrix are also 0’s and 1’s, and are initially all 0’s. Whenever a pattern
pair is stored, matrix elements are set to 1 (i.e., are “activated”) if there are
1’s in the corresponding input and output lines (i.e., the lines are “active”).
In all there are z input-output pairs to be stored.

The retrieval process is governed by a threshold parameter λ. If a previously
stored input pattern is presented to the matrix, the retrieved output pattern
will by definition have 1’s on the output lines that are connected by the matrix
to a number of active input lines that exceeds the threshold. Under restrictions
on the parameters p, q, and z that ensure that the mean density of connections is
less than unity, the threshold λ may be chosen so that one gets a good lower
bound for the probability that the retrieved pattern equals the originally stored
pattern. That is to say, the model parameters can be chosen so that there is
minimal interference between the stored pairs of patterns.
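A minimal simulation of the storage and retrieval rules just described (a clipped Hebbian, Willshaw-type binary matrix) follows. It is a sketch under stated assumptions: the sizes, densities, seed, and the threshold choice λ = (number of active input lines) − 1 are hypothetical and are not the parameter regime analysed in [1].

```python
import random


def random_pattern(length, prob, rng):
    """A random 0/1 pattern in which each entry is 1 with probability prob."""
    return [1 if rng.random() < prob else 0 for _ in range(length)]


def store(pairs, m, n):
    """Clipped Hebbian storage: W[j][k] = 1 if output line j and input line k
    were simultaneously active in at least one stored (input, output) pair."""
    W = [[0] * n for _ in range(m)]
    for x, y in pairs:
        for j in range(m):
            if y[j]:
                for k in range(n):
                    if x[k]:
                        W[j][k] = 1
    return W


def retrieve(W, x, lam):
    """Output line j fires iff it is connected to more than lam active input lines."""
    return [1 if sum(w * xk for w, xk in zip(row, x)) > lam else 0 for row in W]


rng = random.Random(0)
m, n, p, q, z = 200, 200, 0.05, 0.05, 30  # hypothetical sizes, not taken from [1]
pairs = [(random_pattern(n, p, rng), random_pattern(m, q, rng)) for _ in range(z)]
W = store(pairs, m, n)
x0, y0 = pairs[0]
lam = sum(x0) - 1  # hypothetical threshold: require every active input line
print("exact recall:", retrieve(W, x0, lam) == y0)
```

With this choice of λ, an output line fires only if it is connected to every active input line, so every 1 of the stored output is always recovered; errors can only be spurious 1’s on lines that other stored pairs happened to connect, which is the kind of interference discussed above.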

This model may be of biological interest as a model for associative memory, for
instance in the hippocampus. Similar robust models may be important for neural
computing.

2.2 Evaluation of neural networks

The question considered is that of reliable evaluation of a neural network. We
investigate when the behavior of a neural network in a limited number of random
trials gives average results that are representative of the average results for
infinitely many trials [2].

A feed-forward neural net architecture defines a class of functions F_θ (say from
R^d to R) for θ in some parameter set (of weights). We consider n random inputs
X_i, for i = 1, ..., n, taken independently from some probability distribution.
The weak law of large numbers gives for each θ a bound on the probability of the
event that the sample average differs from the expectation by more than ε > 0.
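For a single fixed θ, one standard form of such a bound is Chebyshev's inequality for the sample mean, assuming F_θ(X) has finite variance σ_θ². It is shown only to contrast with the uniform statement that follows and is not necessarily the form used in [2]:

$$
P\left[\left|\frac{1}{n}\sum_{i=1}^{n} F_\theta(X_i) - E[F_\theta(X)]\right| > \varepsilon\right]
\le \frac{\sigma_\theta^2}{n\,\varepsilon^2},
\qquad \sigma_\theta^2 = \operatorname{Var}\bigl(F_\theta(X)\bigr).
$$

The right-hand side controls one θ at a time; because the parameter set is infinite, a union bound over θ gives nothing, which is why the uniform result needs a different argument.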

The research gives a bound on the probability that there exists a θ such that
this happens. This is a uniform large deviation result, which is considerably
more subtle.

One situation where this is important is when the parameter values θ are random,
perhaps depending on X_1, ..., X_n. In that case the long-range expectation
should be computed by taking an independent copy X of the X_i and computing the
conditional expectation of F_θ(X) given X_1, ..., X_n. The preceding result then
gives the same bound on the probability that the sample average with the random
parameter differs from the expectation by more than ε. Thus if a certain
laboriously acquired sample is used for the learning trials that define the
network, then the same sample may be used for later trials that evaluate its
performance.

Such a measurement of performance is relevant to classification problems in
which there is more than one probability distribution. The network is intended
to distinguish between these distributions. It performs well in the long run if
the expectations for the distributions are far apart. The result says that taking
moderate-size samples from each of the distributions gives a good idea of the
corresponding expectations.

The application is to a feed-forward neural network specifying such a function.
In the usual terminology of neural networks there is an input layer, a hidden
layer, and an output layer. Associated with the layers are d input nodes, N
hidden nodes, and one output node. The same nonlinear threshold function φ from R
to R is associated with each hidden node. It is customary to take it to be
continuous and increasing from 0 to 1. The threshold function is fixed in
advance. One conventional choice is φ(u) = 1/(1 + e^{−u}). Often there is also a
threshold function associated with the output node. Since this merely amounts to
a change of variable, we omit this complication.

The network is specified during the learning process by assigning connection
weights. For each hidden-layer node i there is a scalar a_i representing a bias
weight and a vector b_i in R^d corresponding to the weights coming from the input
nodes. Also for each hidden-layer node i there is a scalar c_i representing the
weight going to the output node. We specify the network compactly by giving the
vector a of biases, the matrix B of input weights, and the vector c of output
weights. (The output weights are bounded by a fixed constant.) These quantities
constitute the parameter space.
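The architecture just described computes F(x) = Σ_i c_i φ(a_i + b_i · x). A minimal pure-Python sketch, assuming the logistic choice of φ mentioned above; the weights in the usage lines are arbitrary illustrative values, not trained ones.

```python
import math


def phi(u):
    """Logistic threshold function, increasing from 0 to 1 (overflow-safe form)."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)


def network(x, a, B, c):
    """F(x) = sum_i c_i * phi(a_i + b_i . x) for a one-hidden-layer network.

    x: d inputs; a: N biases; B: N rows b_i of d input weights; c: N output weights.
    """
    return sum(
        c_i * phi(a_i + sum(w * x_k for w, x_k in zip(b_i, x)))
        for a_i, b_i, c_i in zip(a, B, c)
    )


# d = 2 inputs, N = 3 hidden nodes (illustrative weights)
a = [0.0, -1.0, 0.5]
B = [[1.0, -1.0], [0.5, 0.5], [-2.0, 1.0]]
c = [1.0, 0.5, -0.25]
print(network([0.0, 0.0], a, B, c))
```

Keeping a, B and c as separate arguments mirrors the parameter space (a, B, c) described above; bounding the entries of c is left to whatever procedure chooses the weights.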

The reason for this choice of architecture is the observation that it suffices
for synthesizing arbitrary functions. That is, a network of this type with
sufficiently many hidden nodes can approximate a wide class of functions (for
instance, continuous functions on a bounded set) arbitrarily well.

The uniform large deviation result applies to this sort of neural network. The
probability estimate involves a certain polynomial p_d of degree d. (For example,
p_1(m) = 2m and p_2(m) = m² − m + 2.) The statement is about a sample
X_1, ..., X_n of independent and identically distributed random vectors in R^d.
It says that for every ε > 0 there exists ε' > 0 such that for large n the
probability that there exist a, B, c such that the sample average differs from
the expectation by more than ε is bounded by 4 p_d(2n) exp(−(ε')² n/8). The
important point is that as n → ∞ the exponential goes to zero faster than the
polynomial goes to infinity. The probability of a large deviation is thus very
small.
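To see the competition between the polynomial and exponential factors, the bound can be evaluated directly. This sketch assumes p_d(m) = 2 Σ_{k=0}^{d} C(m − 1, k), the number of ways m points in general position in R^d can be split by a hyperplane (Cover's function-counting theorem); it reproduces both examples above, but whether it is exactly the p_d of [2] is an assumption. The value ε' = 0.1 is arbitrary.

```python
import math


def p_d(m, d):
    """Assumed form 2 * sum_{k=0}^{d} C(m-1, k); gives p_1(m) = 2m, p_2(m) = m^2 - m + 2."""
    return 2 * sum(math.comb(m - 1, k) for k in range(d + 1))


def deviation_bound(n, d, eps_prime):
    """The bound 4 p_d(2n) exp(-(eps')^2 n / 8) quoted in the text."""
    return 4 * p_d(2 * n, d) * math.exp(-(eps_prime**2) * n / 8)


for n in (10**3, 10**4, 10**5):
    print(n, deviation_bound(n, d=2, eps_prime=0.1))
```

For ε' = 0.1 and d = 2 the bound exceeds 1 (and so says nothing) at n = 10³ and n = 10⁴, but is below 10⁻⁴⁰ at n = 10⁵: the exponential eventually wins, but only once n is large.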

2.3 Neural network loss functions

The work on reliable evaluation was extended to encompass a global measure of
loss [3].

A neural network is a system that is supposed to perform its function by
learning from experience rather than being programmed by an algorithm. The class
of network architectures that we consider is representative. In considering this
class we fix a smooth nonlinear threshold function φ from R to R. We next choose
integers d and N, corresponding to the number of input nodes and the number of
hidden nodes. There is a single output node. These choices define the
architecture.

The network is then specified by parameters called connection weights. These are
a vector a in R^N of biases, an N by d matrix B of input weights, and a vector c
in R^N of output weights. We write θ = (a, B, c) for an element of the parameter
space.

Let φ^N be the function from R^N to R^N defined by applying φ to each component.
The network is then the function from R^d to R given by F_θ(x) = c · φ^N(Bx + a).
The picture is that the transformation from the input nodes to the hidden nodes
is linear (more precisely, affine), but then there is a thresholding operation at
each hidden node. The transformation to the output node is again linear.

Assume that the input to the network is a random vector X in R^d. A theoretical
measure of the performance of a particular network in approximating a function f
is the expectation

$$ L(\theta) = E\left[\bigl(f(X) - F_\theta(X)\bigr)^2\right]. \tag{1} $$

When this is small the performance is good. Let X_1, ..., X_n be independent
random vectors, each with the same distribution as X. The corresponding empirical
measure of performance is the random variable

$$ L_n(\theta) = \frac{1}{n}\sum_{k=1}^{n} \bigl(f(X_k) - F_\theta(X_k)\bigr)^2. \tag{2} $$

The weak law of large numbers says that for each θ and for each ε > 0 the
probability P[|L_n(θ) − L(θ)| > ε] → 0 as n → ∞. In many situations there is also
a large deviation bound that says that the probability approaches zero
exponentially as n → ∞. Our purpose is to present a large deviation bound that is
uniform in θ. That is, we show that

$$ P\left[\exists\,\theta : |L_n(\theta) - L(\theta)| > \varepsilon\right] \to 0 \tag{3} $$

as n → ∞ and that the probability approaches zero exponentially as n → ∞.
Furthermore, the bounds on the probability are given explicitly in terms of the
network architecture.

This result says that for a large sample it is improbable that there exists a
network for which the empirical measure of performance is misleading.
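The quantities in (1) and (2) are straightforward to compute for a given θ. A hedged sketch, assuming a logistic φ, a hypothetical target f(x) = sin(x₁)·x₂, and inputs uniform on [−1, 1]². Since L(θ) has no closed form here, it is approximated by a much larger independent sample. This only illustrates the fixed-θ law of large numbers; the uniform statement (3) concerns all θ at once and cannot be checked by simulating a single network.

```python
import math
import random


def phi(u):
    """Logistic threshold (one smooth choice; the text only requires smoothness)."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)


def F(theta, x):
    """F_theta(x) = c . phi^N(Bx + a), with theta = (a, B, c)."""
    a, B, c = theta
    hidden = [phi(a_i + sum(b * xk for b, xk in zip(row, x))) for a_i, row in zip(a, B)]
    return sum(c_i * h for c_i, h in zip(c, hidden))


def empirical_loss(theta, f, xs):
    """L_n(theta) = (1/n) sum_k (f(X_k) - F_theta(X_k))^2, as in (2)."""
    return sum((f(x) - F(theta, x)) ** 2 for x in xs) / len(xs)


def target(x):
    # hypothetical target function f
    return math.sin(x[0]) * x[1]


rng = random.Random(1)


def sample(k, d=2):
    # X uniform on [-1, 1]^d (an assumption made for illustration)
    return [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(k)]


theta = ([0.1, -0.2], [[1.0, 0.5], [-0.5, 1.0]], [0.3, -0.7])  # arbitrary N = 2 network
Ln = empirical_loss(theta, target, sample(200))
L_ref = empirical_loss(theta, target, sample(100_000))  # large-sample stand-in for L(theta)
print(Ln, L_ref, abs(Ln - L_ref))
```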

2.4 A self-organizing process

The work on “Stability of a self-organizing process” [4] treats the stability of
the Markov chain involved in the self-organizing feature maps of Kohonen. These
maps are determined by the effect of a random environment. The values of these
maps learn to imitate the environment while also attempting to preserve the
neighborhood topology. We give conditions under which two initial states approach
each other exponentially fast for all time with probability one. Thus the initial
state does not matter; the environment determines future history.

One way of constructing a Markov chain is as follows. Consider a state space and
a sequence F_1, ..., F_n, ... of independent random functions from the state
space to itself. Take an initial point X. The random orbit defined by the
iterations X_n = F_n ∘ ⋯ ∘ F_1 X is the Markov chain.

The self-organizing feature map is constructed in this way. Consider a finite
subset A of the integer lattice Z^d with t points. The state space is the set of
all functions X from A to R^N.

Fix a probability measure μ on R^N and a shrinking parameter α with 0 < α < 1.
Also fix an integer range parameter r ≥ 0. The random functions F_n are defined
as follows. For each n choose an independent point ω in R^N from the probability
distribution μ. Then choose the i in A that minimizes the distance |X(i) − ω|.
Consider the neighborhood of i consisting of all j in A with |i − j| ≤ r. Then
F_n X is the new state where F_n X(j) = αω + (1 − α)X(j) for all j in this
neighborhood, and the other values are unchanged.
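The update rule can be written out directly. A minimal sketch of one random map F_n for a given stimulus ω; the ℓ1 norm on the lattice, Euclidean distance in R^N, tie-breaking by the first minimizer, and the usage parameters (d = N = 1, t = 10, μ uniform on [0, 1], α = 0.05, r = 1) are all illustrative assumptions.

```python
import math
import random


def kohonen_step(X, omega, alpha, r):
    """One application of the random map F_n for a given stimulus omega.

    X: dict mapping lattice sites (tuples in Z^d) to points in R^N (lists).
    The winner i minimizes |X(i) - omega|; every j with |i - j| <= r moves
    a fraction alpha of the way toward omega. Lattice distance is taken to
    be the l1 norm here (an assumption; the text does not fix the norm).
    """
    winner = min(X, key=lambda i: math.dist(X[i], omega))
    new_X = dict(X)
    for j in X:
        if sum(abs(a - b) for a, b in zip(winner, j)) <= r:
            new_X[j] = [alpha * w + (1 - alpha) * x for w, x in zip(omega, X[j])]
    return new_X


# d = N = 1: ten lattice sites, stimuli uniform on [0, 1]
rng = random.Random(0)
X = {(i,): [rng.random()] for i in range(10)}
for _ in range(5000):
    X = kohonen_step(X, [rng.random()], alpha=0.05, r=1)
print([round(X[(i,)][0], 2) for i in range(10)])
```

Setting r = 0 in the same function gives the clustering process discussed below, in which only the winner moves.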

One usually takes A to be t points constituting a rectangular subset of Z^d. For
mathematical investigations it is convenient to take μ to be the uniform measure
on a product of N intervals.

The main parameters for the feature map are the integers d and N, representing
the dimensions of the domain and range, and the range r of the interaction in the
domain. The number t of points in the domain is another parameter, and finally
there is the shrinking parameter α.

Most often d = N; however, sometimes it is interesting to take the more general
case d < N.

The case of zero range r = 0 is a self-organizing clustering process. In this
case A plays the role of a structureless index set. However, in the case r = 1 of
nearest-neighbor interaction the process attempts to preserve topology, and so
might be called topological clustering.

One picturesque interpretation of the case d = N = 2 is when A is thought of as a
set of cells in the cortex and the region in R² is thought of as the retina. The
map X from the cortex to the retina develops in such a way that the image points
X(i) fall in areas of the retina with extensive stimulation, while at the same
time nearby cells in the cortex tend to be connected to nearby points in the
retina.

We consider the stability question, formulated as follows. Take α > 0. Take two
initial states X and Y and look at their orbits under the same random sequence of
functions. The question is whether, when the initial X and Y are sufficiently
close, there is a nonzero probability that they stay close for all future time.

If X and Y are close, then the i that makes X(i) closest to ω may also be the i
that makes Y(i) closest to ω. In that case, X(i) − Y(i) is replaced by
(1 − α)(X(i) − Y(i)), and similarly for every j in the neighborhood of i. Thus the
two points have been shrunk together. The question is whether this shrinking can
persist for all future time.
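The coupling in this question is easy to simulate. A hedged sketch for d = N = 1, assuming stimuli ω uniform on [0, 1] and the lattice distance |i − j| (illustrative choices, not taken from [4]). It reports max_i |X(i) − Y(i)| after each step; this quantity can only shrink or stay the same at steps where both chains pick the same winner, and can grow when they do not.

```python
import random


def kohonen_step(X, omega, alpha, r):
    """Update for d = N = 1; X is a list of reals indexed by sites 0..t-1."""
    winner = min(range(len(X)), key=lambda i: abs(X[i] - omega))
    return [alpha * omega + (1 - alpha) * x if abs(i - winner) <= r else x
            for i, x in enumerate(X)]


def coupled_distance(X, Y, steps, alpha, r, seed):
    """Drive X and Y with the same stimuli; return max_i |X(i) - Y(i)| after each step."""
    rng = random.Random(seed)
    history = []
    for _ in range(steps):
        omega = rng.random()  # the same omega for both chains (uniform on [0, 1], assumed)
        X = kohonen_step(X, omega, alpha, r)
        Y = kohonen_step(Y, omega, alpha, r)
        history.append(max(abs(x - y) for x, y in zip(X, Y)))
    return history


X0 = [i / 10 for i in range(10)]  # an increasing initial map
Y0 = [x + 0.001 for x in X0]      # a nearby initial map
h = coupled_distance(X0, Y0, steps=2000, alpha=0.1, r=1, seed=0)
print(h[0], h[-1])
```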

The  ergodic  behavior  has  been  studied  rigorously  in  a  fundamental  paper 
by  Cottrell  and  Fort.  They  treat  the  one-dimensional  case  d  =  1  and  N  =  1 
with  nearest  neighbor  interaction  (range  r  =  1).  One  interesting  feature  of 
this  case  is  that  it  makes  sense  to  say  that  a  map  X  is  increasing  or  decreas¬ 
ing.  Cottrell  and  Fort  prove  that  the  map  is  eventually  either  increasing  or 
decreasing.  Once  it  has  reached  one  or  the  other  status  it  remains  that  way. 
Furthermore  for  maps  of  either  status  they  prove  convergence  to  a  stationary 
distribution.  (The  higher  dimensional  case  has  been  studied  by  Ritter  and 
Schulten.  Much  of  their  work  uses  a  diffusion  approximation.)  We  prove 
stability  for  the  one-dimensional  case  studied  by  Cottrell  and  Fort.  We  also 
obtain  partial  results  on  the  higher  dimensional  case. 

2.5 A neural oscillator

One component of the research is modelling of parts of the nervous system in
various invertebrates [5]. This has involved the creation of a computer model that
can accommodate biologically realistic parameters and a wide variety of neural
connections. Members of the experimental group in Neuroscience have been helpful
in this enterprise. Brian Smith and Tom Christensen have made suggestions about
modelling parts of the insect olfactory system, and Ed Arbas has provided data on
the leech heartbeat timing mechanism. One result of the latter contact has been
research on neural oscillators. The experimental background is contained in the
work of Ronald L. Calabrese, James D. Angstadt, and Edmund A. Arbas, “A neural
oscillator based on reciprocal inhibition,” in Perspectives in Neural Systems and
Behavior, T. J. Carew and D. Kelley (eds.), Alan R. Liss, Inc., N.Y., 1989. The
system studied in these experiments was the subject of a preliminary computer
analysis. This led to work that gave a better theoretical understanding of the
system.

In the simplest version, the oscillator consists of two neurons, numbered 1 and
2. One can make the idealization that the oscillator is described by four
variables, the two membrane voltages v_1 and v_2 and two conductances g_1 and
g_2. The system is described by four differential equations for the four
variables. The equations for the two voltages v_i and two excitatory conductances
g_i are

$$ c\,\frac{dv_i}{dt} = (E_- - v_i)\,s(v_{i'}) + (E_+ - v_i)\,g_i $$

and

$$ \tau\,\frac{dg_i}{dt} = r(v_i) - g_i. $$

Here i ranges from 1 to 2 and labels one neuron, while i' = 3 − i labels the
other neuron. The voltage equations are the current conservation equations for a
circuit with two parallel channels. The parameter c > 0 is the capacitance. The
parameters E_− and E_+ are voltages induced in the two channels by ionic
diffusion. One takes E_− < E_+ and refers to these as the inhibitory and
excitatory channels. The conductance s(v_{i'}) in the inhibitory channel is given
by a positive increasing function s of the voltage v_{i'} in the other neuron.
(Thus a high voltage in one neuron increases the conductance in the inhibitory
channel in the other neuron; the two neurons are mutually inhibitory.) The
conductance in the excitatory channel is given by the variable g_i. The equations
for the excitatory conductances involve a time constant τ > 0 that governs the
rate of their approach to equilibrium. The equilibrium conductance r(v_i) is a
positive decreasing function r of the voltage v_i in the same neuron. One wishes
to compare the solution of the system of four differential equations with the
discontinuous solutions of systems of two differential equations. There are two
such systems, one defining a “slow motion” (in voltage equilibrium) and one
defining a “fast motion” (between voltage equilibria). One leaves the slow motion
at a “junction point” where the motion cannot continue in voltage equilibrium.
Then a fast motion moves the system very rapidly to a “drop point,” where the
voltage equilibrium is resumed.
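The four equations can be integrated numerically to see the interplay of fast voltage and slow conductance dynamics. This is a forward-Euler sketch only: the sigmoidal forms of s and r, the values of c, τ, E_− and E_+, and the initial state are hypothetical and are not fitted to the leech data of Calabrese, Angstadt and Arbas. Neurons are indexed 0 and 1 in the code, so the other neuron is 1 − i rather than 3 − i.

```python
import math


def simulate(T=200.0, dt=0.01, c=1.0, tau=20.0, E_minus=-1.0, E_plus=1.0):
    """Forward-Euler integration of
        c   dv_i/dt = (E_- - v_i) s(v_{i'}) + (E_+ - v_i) g_i
        tau dg_i/dt = r(v_i) - g_i,
    where i' is the other neuron. s and r are hypothetical choices:
    s positive and increasing, r positive and decreasing.
    """
    def s(v):
        return 1.0 / (1.0 + math.exp(-8.0 * v))        # positive, increasing (assumed form)

    def r(v):
        return 0.1 + 0.9 / (1.0 + math.exp(8.0 * v))   # positive, decreasing (assumed form)

    v = [0.5, -0.5]  # asymmetric start: neuron 0 depolarized
    g = [0.5, 0.5]
    trace = []
    for _ in range(round(T / dt)):
        dv = [((E_minus - v[i]) * s(v[1 - i]) + (E_plus - v[i]) * g[i]) / c for i in range(2)]
        dg = [(r(v[i]) - g[i]) / tau for i in range(2)]
        v = [v[i] + dt * dv[i] for i in range(2)]
        g = [g[i] + dt * dg[i] for i in range(2)]
        trace.append((v[0], v[1]))
    return trace


trace = simulate()
print(trace[-1])
```

Because each Euler step with small dt moves v_i to a weighted average of v_i, E_− and E_+, the voltages stay between E_− and E_+. The slow/fast decomposition described above presumably corresponds to τ being large compared with the voltage relaxation time c/(s + g); whether a given parameter set actually alternates should be checked by inspecting the trace.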
3 Network Connectivity

3.1 The trapping transition: communication blockage in a random network

This project, which was carried out by graduate student M. Pokorny in
collaboration with Newman and with D. Meiron of Caltech, is primarily percolation
theoretic and involves computer simulations of two-dimensional randomly connected
bond networks [6]. Let us describe the basic phenomenon under investigation using
neural network terminology.

Suppose all active synapses are short range (say nearest neighbor), but only a
fraction p of all possible short-range synapses are actually active. Suppose
further that p is above the percolation threshold p_c, so that long-range
communication is possible through the infinite cluster of active synapses. Assume
that all active synapses in this infinite cluster (but none of those in finite
clusters) are suddenly destroyed (e.g., by a massive “seizure” or invasion by a
virus which can only be transmitted through the currently active synaptic
connections). We ask whether global communication can be restored to the system
by activating the currently inactive synapses and utilizing them together with the
currently active synapses (which belonged to finite clusters and hence were not
destroyed by the seizure/invasion).

It is quite clear that if p < 1 − p_c, then the answer to the above question is
yes, since then the inactive synapses by themselves percolate. The answer should
continue to be yes past 1 − p_c until a new critical value k_c is reached; beyond
k_c, communication from the origin through undestroyed synapses is blocked or
trapped by the destroyed synapses. Let us call the set of sites that can be
communicated with from the origin in this situation its “trap.” In our
simulations we investigated the value of this trapping critical point k_c, and
more significantly the issue of whether the trapping phase transition is in the
same “universality class” as the conventional percolation transition at p_c. The
work was motivated by earlier work on the related issue of “invasion percolation”
by Willemsen and Wilkinson and by J. and L. Chayes and Newman.

Our numerical results for two dimensions are as follows. The value of k_c is
estimated as about 0.520, which is indeed above 1 − p_c (which here is exactly
equal to 0.5). The critical exponents were studied by estimating the mean trap
size for p above k_c and then using finite-size scaling procedures.

The conclusion is that the critical exponents are completely consistent with
those occurring in the ordinary percolation phase transition. Thus there does not
appear to be a new universality class involved (disagreeing with earlier claims of
Willemsen and Wilkinson). L. Kadanoff has noted that this negative result may be
of more than passing interest, since it was thought that invasion percolation,
being a dynamical model, was capable of generating critical phenomena not
obtainable by static percolation models.
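The trapping question can be explored with a crude finite-box simulation. This sketch is an assumption-laden stand-in rather than the method of [6]. On an L × L square grid, bonds are active with probability p. Active clusters crossing the box from left to right play the role of the infinite cluster and have their bonds destroyed. Every other bond (inactive ones, and active ones in non-crossing clusters) becomes usable, and “communication is restored” is taken to mean that the usable bonds cross the box. The names and the crossing criterion are illustrative.

```python
import random


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def grid_bonds(L):
    """All nearest-neighbor bonds of an L x L grid; site (x, y) has index x + L*y."""
    bonds = []
    for y in range(L):
        for x in range(L):
            s = x + L * y
            if x + 1 < L:
                bonds.append((s, s + 1))
            if y + 1 < L:
                bonds.append((s, s + L))
    return bonds


def components(L, bonds):
    uf = UnionFind(L * L)
    for a, b in bonds:
        uf.union(a, b)
    return uf


def spanning_roots(L, uf):
    """Cluster labels touching both the left (x = 0) and right (x = L - 1) columns."""
    left = {uf.find(L * y) for y in range(L)}
    right = {uf.find(L - 1 + L * y) for y in range(L)}
    return left & right


def trap_spans(L, p, seed):
    """True if the usable bonds still cross the box after the crossing
    active clusters (a finite-box proxy for the infinite cluster) are destroyed."""
    rng = random.Random(seed)
    bonds = grid_bonds(L)
    active = [b for b in bonds if rng.random() < p]
    active_set = set(active)
    uf = components(L, active)
    doomed = spanning_roots(L, uf)
    usable = [b for b in bonds if not (b in active_set and uf.find(b[0]) in doomed)]
    return bool(spanning_roots(L, components(L, usable)))


for p in (0.45, 0.50, 0.55, 0.60):
    frac = sum(trap_spans(40, p, seed) for seed in range(50)) / 50
    print(p, frac)
```

Scanning p and L in this way only gives a rough picture; locating k_c and extracting critical exponents requires the trap-size measurements and finite-size scaling described above.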

3.2 Markov fields on branching planes: the connectivity transition in a layered
network

This project, which is part of graduate student C. Wu’s thesis research, involves
a mixture of percolation and Ising theoretic techniques in the analysis of the
connectivity properties of layered networks [7]. The type of network studied is a
stack of tree-like graphs (Bethe lattices) which, in addition to their
“horizontal” tree-graph edges, also have “vertical” nearest-neighbor edges
between the individual layers of the stack. The edges may be thought of as
potential synaptic connections.

In the percolation version of the model, only a fraction p of the horizontal
synapses and a fraction v of the vertical synapses are active. G. Grimmett and
Newman studied the global connectivity properties of the system as these two
parameters vary and discovered a pair of transitions: first from local
connectivity to large-scale connectivity that is concentrated within layers, and
then to diffuse large-scale connectivity (i.e., in which the original layered
nature of the model is not mirrored by the connectivity pathways).

In the continuation of this work by Newman and Wu, it has been shown that this
double transition persists for Ising (2-state neuron) and Potts (multi-state
neuron) models on such a layered network, as a pair of coupling parameters is
varied.

3.3 Ising models and dependent percolation

The relation between ferromagnetic (excitatory) Ising systems and percolation
models has been very fruitful, as in the work of [7]. For systems with both
inhibitory and excitatory synapses, the relation to percolation is more
complicated. In [8], Newman reviews this relation and treats some extra issues
that arise for multi-state neurons (Potts models).


3.4 Topics in percolation

This review article [9] covers recent work in percolation, including that of
[6, 7, 8].

4 Computer Networks and Computer Performance Analysis

The research supported by this contract has included a substantial component on
the performance analysis of stochastically modelled computer networks, and of
computer systems in general [10, 11, 12, 13, 14]. Robert Maier has investigated a
number of resource contention models both theoretically and numerically.
Simulations have been performed on a Connection Machine located at the head
offices of Thinking Machines in Cambridge. The Connection Machine was made
available under a DARPA grant, and accessed via the national Internet.

The probabilistic models investigated were all models of complex systems:
systems comprising many loosely cooperating agents, the overall behavior of which
is not easily predictable. The prototypical example is a network of computers
which compete for access to a single communication channel. Another is a system
in which agents compete for a divisible (rather than discrete) resource, such as
computer memory. In both cases competition can get out of hand: the resource may
be exhausted, or be poorly apportioned. The mathematical study of such phenomena
requires estimates on the first passage times of Markov processes, and large
deviation theory is a major tool.

The following subsections review the models investigated, and summarize the
papers completed and in preparation.

4.1 Network instabilities

Random access broadcast channels are frequently employed in packet-switched data
communication networks. They have two defining characteristics:

• A single logical bus over which most data moves. Data transmissions are usually
received by all nodes on the network, and overlapping transmissions can interfere
with one another.
• A comparative absence of control signalling, whether out-of-band or in-band.
Nodes transmit randomly and more or less independently, according to some
protocol.

Since the nodes are independent, in the event of packet collisions they must
implement their own retransmission strategies. They must buffer blocked or
unsuccessfully transmitted packets and attempt to retransmit them later.

Many protocols for controlling random access broadcast channels are unstable:
under conditions of heavy load the packet collision rate may suddenly rise to an
unacceptable level. But determining theoretically whether any given random-access
protocol is unstable is a vexing problem. Such protocols as CSMA/CD (Carrier Sense
Multiple Access/Collision Detect) have been repeatedly studied, both analytically
and numerically, but the results of the studies have been of limited practical
applicability. This is in part because much work has concentrated on
infinite-user models: network models in which the total rate of packet arrivals
is finite, but the number of network nodes is infinite.

Several authors, beginning with Kleinrock and Lam, have explored the way in which
such instabilities appear in stochastically modelled networks as N, the number of
nodes, tends to infinity. Studies of large-N networks have revealed why heavily
loaded networks become unstable: the associated Markov chain exhibits two points
of stable equilibrium. One corresponds to a desirable high-throughput,
low-contention state, and the other to an undesirable low-throughput,
high-contention state. The sudden degradation in network performance corresponds
to a transition from the former to the latter.
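The two-equilibrium picture can be illustrated with the textbook finite-user slotted ALOHA model, which is not necessarily the model of [10]. Assume N nodes: each of the k backlogged nodes retransmits in a slot with probability ν, each of the other N − k nodes receives a new packet (and transmits it at once) with probability σ, and a slot succeeds if exactly one node transmits. The sketch computes the expected one-slot change of the backlog; sign changes of this drift mark equilibria of the mean dynamics, with positive-to-negative crossings stable. The 1/N scaling of σ and ν mimics the scaling described in the next paragraph, and the constants 0.35 and 5.0 are arbitrary.

```python
def drift(k, N, sigma, nu):
    """Expected one-slot change in the backlog of a finite-user slotted ALOHA model.

    With A new arrivals, the next backlog is k + A - [slot succeeded],
    so the drift is E[A] - P(exactly one transmission).
    """
    m = N - k  # nodes without a backlogged packet
    p_new_only = m * sigma * (1 - sigma) ** (m - 1) * (1 - nu) ** k if m > 0 else 0.0
    p_retx_only = (1 - sigma) ** m * k * nu * (1 - nu) ** (k - 1) if k > 0 else 0.0
    return m * sigma - (p_new_only + p_retx_only)


def equilibria(N, sigma, nu):
    """Backlogs k where the drift changes sign between k and k + 1."""
    d = [drift(k, N, sigma, nu) for k in range(N + 1)]
    return [k for k in range(N) if (d[k] > 0) != (d[k + 1] > 0)]


# Hypothetical per-node probabilities proportional to 1/N.
N = 1000
print(equilibria(N, sigma=0.35 / N, nu=5.0 / N))
```

Three sign changes (stable, unstable, stable) correspond to the desirable and undesirable states described above; whether they occur depends on σ, ν and N, so the constants may need to be varied.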

Robert Maier’s research [10] evaluated the performance of computer networks in the large-$N$ limit, but a large-$N$ limit different from that used in the conventional infinite-user model. As $N \to \infty$, the packet retransmission rates on each individual node, as well as the packet arrival rates, were taken proportional to $N^{-1}$. This choice of scaling allowed the use of Ventcel-Freidlin theory, the theory of the large deviations (i.e., leading-order fluctuations) of Markov chains. If the time until performance degradation is denoted $\tau$, then according to the theory its expectation has leading-order asymptotics

$$E\{\tau\} \sim C N^{\alpha} e^{N S_0}, \qquad N \to \infty,$$

in which $S_0$ can in principle be computed exactly.
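To see where an asymptotic law of the form $E\{\tau\} \sim C N^{\alpha} e^{N S_0}$ can come from, one can compute an exact mean exit time in a toy model. The sketch below uses an assumed lazy birth-death chain on $\{0, \dots, N\}$ with constant up and down probabilities, not the network model of [10]; the function name is hypothetical. It solves the standard recursion for the expected time to climb from one level to the next.

```python
def mean_hitting_time(N, up, down):
    """Exact E[steps to reach N from 0] in a lazy birth-death chain on {0, ..., N}.

    From k < N the chain moves up w.p. `up`, down w.p. `down` (only when k > 0),
    and otherwise stays put. The expected time t_k to first go from k to k + 1
    satisfies t_k = (1 + down * t_{k-1}) / up, with t_0 = 1 / up.
    """
    if not (0 < up and 0 <= down and up + down <= 1):
        raise ValueError("need up > 0, down >= 0 and up + down <= 1")
    total, t_prev = 0.0, 0.0
    for k in range(N):
        q = down if k > 0 else 0.0  # no downward move from the bottom level
        t_k = (1.0 + q * t_prev) / up
        total += t_k
        t_prev = t_k
    return total
```

With `up = 0.2` and `down = 0.3` the recursion gives $t_k = 15 \cdot 1.5^k - 10$, so the mean time is $30(1.5^N - 1) - 10N$. In this toy case $C = 30$, $\alpha = 0$, and $S_0 = \log(\text{down}/\text{up}) = \log 1.5$.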
The Ventcel-Freidlin formalism was applied to two random-access systems: a time-slotted ALOHANET model, and an unslotted persistent CSMA/CD model. In both cases $S_0$ was computed. The retransmission protocol used in the latter model closely resembled that used by Ethernet, so the computation of $S_0$ is relevant to the stability of real-world computer networks.


4.2  Local  contention  models 

The network models reviewed in the last subsection exhibited a comparatively simple contention for resources. There was only a single resource: the network bus. All agents competed for it, and competition was global.

Robert Maier is now investigating, both theoretically and by means of simulations, a more realistic model of local contention [11]. Consider a Euclidean or hypercubic grid of processors, each equipped with its own local memory. Suppose that each processor is running its own program, and that there is minimal interprocessor communication. Any given processor may, however, from time to time require more memory than it has available. It may then attempt to access, and use, the memory resources of its nearest neighbors.

This sort of contention for resources is purely local, and the resources are as distributed as the agents. But the simplest scheme for resolving collisions is the same as in the global case: ‘colliding’ nodes (those that wish to use the same memory at the same time) resolve the deadlock by backing off a random amount of time and trying again. Such a random-access protocol can be quite efficient. But under conditions of heavy load, i.e., frequent requests for adjacent memory, it may give rise to bistability. As in the global case, collisions and retries may escalate suddenly to an unacceptable level.
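The collision-and-backoff scheme just described can be sketched as a small simulation. Everything here is an assumption made for illustration: an $L \times L$ torus, four nearest neighbours, a fixed per-step request probability, and a uniform random backoff. It is not the model or code of [11], and the function name is hypothetical.

```python
import random

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def simulate_grid_contention(L, request_prob, max_backoff, steps, seed=0):
    """Toy local-contention simulation on an L x L torus (illustrative assumptions).

    A processor that is not backing off and has no pending request issues one
    with probability request_prob, aimed at the memory of a random nearest
    neighbour. Each processor with a pending request attempts it every step
    unless backing off. If two or more attempts hit the same memory in one
    step, all collide and each collider backs off for 1..max_backoff steps
    (uniform), then retries. Returns (granted, collisions), where collisions
    counts colliding attempts.
    """
    rng = random.Random(seed)
    backoff = {(i, j): 0 for i in range(L) for j in range(L)}
    pending = {}  # processor -> target memory cell
    granted = collisions = 0
    for _ in range(steps):
        attempts = {}  # target memory cell -> processors attempting it
        for cell in backoff:
            if backoff[cell] > 0:
                backoff[cell] -= 1
                continue
            if cell not in pending and rng.random() < request_prob:
                di, dj = rng.choice(NEIGHBOURS)
                pending[cell] = ((cell[0] + di) % L, (cell[1] + dj) % L)
            if cell in pending:
                attempts.setdefault(pending[cell], []).append(cell)
        for procs in attempts.values():
            if len(procs) == 1:
                granted += 1
                del pending[procs[0]]
            else:
                collisions += len(procs)
                for proc in procs:
                    backoff[proc] = rng.randint(1, max_backoff)
    return granted, collisions
```

Raising `request_prob` and comparing collisions per granted request across runs is one crude way to look for the sudden escalation described above.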

The  methods  of  Ventcel  and  Freidlin  are  not  appropriate  to  this  problem; 
although  it  can  be  thought  of  as  a  Markov  chain  problem,  the  dimensionality 
of  the  state  space  is  very  large.  In  fact  it  must  be  approached  largely  through 
simulations.  Simulations  are  being  conducted  on  the  Connection  Machine  at 
Thinking  Machines’  offices  in  Cambridge.  The  Connection  Machine  is  being 
accessed  via  the  NSFNET,  and  Sun  and  other  workstations  at  the  University 
of  Arizona  are  being  used  as  front  ends. 
4.3 Memory exhaustion

Another problem investigated by Robert Maier was that of memory exhaustion in a system with only two competing agents [12]. The problem considered was that of two stacks evolving in a bounded region. This is one of the simplest problems in the area of dynamic data structures, but a full analysis requires surprisingly deep mathematics. It was originally proposed by Knuth, and was investigated further by Flajolet and Louchard. Ventcel-Freidlin theory, it turns out, allows the treatment of the case in which the mobility of the stacks is allowed to vary with height.

The basic questions are as follows. Suppose one allocates a contiguous block of $m$ cells of memory, and allows two stacks to evolve (i.e., randomly grow and shrink) within it. The two stacks begin on opposite sides of the block; one grows upward and the other downward. How long will it be before they collide? And at the time of collision, what will their sizes be? What is sought are asymptotic estimates, valid as $m \to \infty$.

The time to collision and the final stack sizes are of course random variables. Their distributions will depend on the probabilities assigned to the possible evolutionary histories of the stack system. Knuth suggested that at each time step, there should be probability $p$ of each stack increasing by one in size, and probability $1/2 - p$ of each stack’s height decreasing. This choice defines a Markov chain on the space of states of the two-stack system, which is parametrized by the two stack heights.
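Knuth’s model can be simulated directly. The sketch below assumes that a deletion from an empty stack leaves it empty, since the text does not say how this boundary is handled; the function name is hypothetical.

```python
import random


def simulate_two_stacks(m, p, seed=None):
    """One Monte Carlo run of Knuth's two-stack model in a block of m cells.

    Each step has four mutually exclusive outcomes: stack 1 grows (prob p),
    stack 2 grows (prob p), stack 1 shrinks (prob 1/2 - p), stack 2 shrinks
    (prob 1/2 - p). Shrinking an empty stack is a no-op (assumption).
    Returns (steps, (j, k)) at the first time j + k == m.
    """
    if not 0 < p <= 0.5:
        raise ValueError("p must lie in (0, 1/2]")
    rng = random.Random(seed)
    j = k = steps = 0
    while j + k < m:
        u = rng.random()
        steps += 1
        if u < p:
            j += 1
        elif u < 2 * p:
            k += 1
        elif u < 0.5 + p:
            j = max(j - 1, 0)
        else:
            k = max(k - 1, 0)
    return steps, (j, k)
```

Repeating the run many times for moderate $m$ and $p < 1/4$ gives an empirical picture of both the collision time and the final pair of heights.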

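The initial state of the system is $(0,0)$, and the set of final states $F$ is

$$F = \{(j,k) \in \Omega_m \mid j + k = m\},$$

where $\Omega_m = \{(j,k) : j, k \ge 0,\ j + k \le m\}$ is the space of pairs of stack heights.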

In  terms  of  the  Markov  chain,  the  basic  questions  are 

•  How  many  operations  take  place  before  a  state  in  F  is  reached? 

•  Which  state  will  it  be? 

The  answers  are  probabilistic,  and  depend  markedly  on  p. 

The $p < 1/4$ case is the most realistic, and the hardest to analyse. With this choice of $p$ the stacks will be biased toward contraction, and the mean time to collision will grow exponentially rather than polynomially in $m$. Flajolet made an interesting discovery: in this case, the limiting distribution on $F$ is uniform. That is to say, if $m$ is large and the stacks are biased toward contraction, when the stacks finally collide they are as likely to be large as small. This has obvious implications for the design of algorithms. In particular it would seem to indicate that the shared storage method considered here, with the stacks sharing a block of size $m$, is usually far more efficient than a separate arrangement, with the stacks confined to their own regions of size $m/2$. The latter scheme would run out of memory much earlier.

Robert Maier showed that the uniform distribution over $F$ is an artifact, attributable to Knuth’s choice of random process. Traditionally the behavior of the two stacks is assumed to be independent of their size: $p$, the probability of an insertion into either stack, is independent of state. It is reasonable to allow more general behavior: the probability of insertions into and deletions from each stack could depend on the height of the stack as a fraction of $m$. In particular one can let the insertion probability be $(1 - g(x))/4$, and the deletion probability be $(1 + g(x))/4$, in which $x$ is the height of the stack, as a fraction of available memory, and $g(\cdot) \ge 0$ is some sufficiently smooth function defined on $[0, 1]$. The quantity $g(x)$ measures the extent to which deletions from a stack predominate over insertions; a constant $g$ recovers Knuth’s model with $p = (1 - g)/4$.
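For small $m$ the height-dependent model can be solved exactly instead of simulated: the mean collision time and the collision distribution on $F$ follow from the fundamental matrix of the absorbing Markov chain. This is a minimal sketch with NumPy, assuming that deletion from an empty stack is a no-op; the function name is hypothetical. A constant $g$ gives Knuth’s model with $p = (1 - g)/4$.

```python
import numpy as np


def collision_statistics(m, g):
    """Exact mean collision time and collision distribution on F for two stacks.

    A stack of height h inserts with probability (1 - g(h/m))/4 and deletes
    with probability (1 + g(h/m))/4, so g must satisfy |g| <= 1. A deletion
    from an empty stack is a no-op (assumption). Returns
    (mean number of steps from (0, 0), {(j, m - j): probability}).
    """
    interior = [(j, k) for j in range(m) for k in range(m - j)]  # j + k < m
    index = {s: i for i, s in enumerate(interior)}
    final = [(j, m - j) for j in range(m + 1)]                    # the set F
    fidx = {s: i for i, s in enumerate(final)}
    Q = np.zeros((len(interior), len(interior)))  # interior -> interior
    R = np.zeros((len(interior), len(final)))     # interior -> F
    for (j, k), i in index.items():
        gj, gk = g(j / m), g(k / m)
        moves = (
            ((j + 1, k), (1 - gj) / 4),
            ((j, k + 1), (1 - gk) / 4),
            ((max(j - 1, 0), k), (1 + gj) / 4),
            ((j, max(k - 1, 0)), (1 + gk) / 4),
        )
        for state, prob in moves:
            if state in index:
                Q[i, index[state]] += prob
            else:
                R[i, fidx[state]] += prob
    A = np.eye(len(interior)) - Q
    mean_times = np.linalg.solve(A, np.ones(len(interior)))  # (I - Q) T = 1
    absorb = np.linalg.solve(A, R)                            # (I - Q) B = R
    start = index[(0, 0)]
    return float(mean_times[start]), {s: float(absorb[start, fidx[s]]) for s in final}
```

Comparing a constant choice such as `g = lambda x: 0.2` with an increasing one such as `g = lambda x: 0.1 + 0.5 * x` shows how the collision distribution shifts with $g$; the asymptotic statements in the text concern $m \to \infty$ and are not verified by small cases.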

If $g$ is not constant, the limiting distribution on $F$ will not be uniform. Two possibilities deserve mention: either the limiting distribution will be localised at $(m/2, m/2)$, or it will be concentrated at $(0, m)$ and $(m, 0)$. The former occurs if $g' > 0$, in which case the stacks become more biased toward contracting as they grow. It also occurs if $g(0) = 0$, in which case the bias toward contraction disappears at low stack height. The latter typically obtains if $g' < 0$.

Besides the limiting distribution on $F$, the distribution of $\tau$, the number of operations that take place before $F$ is reached, was studied. In the case of constant $p < 1/4$, which corresponds to constant $g(\cdot)$, Flajolet proved combinatorially that its distribution is asymptotic to that of an exponential random variable, with mean roughly $O\big(((1/2 - p)/p)^m\big)$. It was shown that this phenomenon occurs very generally: for any sufficiently smooth $g(\cdot) > 0$, there are constants $C_0$, $\alpha$ and $S_0$ such that $\tau / (C_0 m^{\alpha} e^{m S_0})$ is asymptotically exponential with unit mean. Explicit formulae for $C_0$, $\alpha$ and $S_0$ in terms of $g(\cdot)$ were derived.
4.4 Weighted evolutions of data structures

Robert Maier’s research included some other work on stochastically modelled data structures. The emphasis was on the asymptotics of the cost of a sequence of operations on a data structure, rather than on the appearance of instabilities or other undesirable behavior. The same Ventcel-Freidlin theory proved useful, however.

A number of authors had previously derived asymptotic expressions for the average cost of $n$ operations on such data structures as priority queues and dictionaries. The expressions depended on (1) the implementation of the data structure, and (2) the probability measure over random sequences of operations (insertion, deletion, and queries of various sorts) used in computing the average.

In the present investigation [13] an equiprobability of histories was assumed: all possible sequences of alterations of the data structure were taken as equiprobable. This included alterations consequent on the insertion of a datum, on the deletion of a datum, and on accessing the structure to query it or to alter a datum in some way without removing it. List and $d$-heap implementations of priority queues were treated, as were list implementations of linear lists and dictionaries. In the case of dictionaries, an arbitrary number of query types was allowed.
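The notion of equiprobable histories can be made concrete by counting weighted lattice paths. The sketch below follows Flajolet-style counting conventions that are assumed here rather than taken from [13]: a history starts and ends with an empty structure, an insertion into a priority queue of size $k$ can occur in $k + 1$ ways (the rank of the new key), and a delete-min can occur in one way. It computes, exactly, the mean integrated size over all histories of length $n$; the function name is hypothetical.

```python
from fractions import Fraction


def mean_integrated_size(n):
    """Exact mean of sum_{t=0..n} size(t) over equiprobable priority-queue histories.

    Assumed counting conventions (hypothetical, Flajolet-style): a history of
    n operations starts and ends empty; inserting into a queue of size k can
    happen in k + 1 ways (rank of the new key); delete-min happens in 1 way.
    """
    # fwd[t][k]: weighted number of length-t prefixes that end at size k
    fwd = [[0] * (n + 2) for _ in range(n + 1)]
    fwd[0][0] = 1
    for t in range(n):
        for k in range(n + 1):
            w = fwd[t][k]
            if w:
                fwd[t + 1][k + 1] += w * (k + 1)  # insertion: k + 1 ways
                if k > 0:
                    fwd[t + 1][k - 1] += w        # delete-min: 1 way
    # bwd[t][k]: weighted number of ways from size k at time t to empty at time n
    bwd = [[0] * (n + 2) for _ in range(n + 1)]
    bwd[n][0] = 1
    for t in range(n - 1, -1, -1):
        for k in range(n + 1):
            down = bwd[t + 1][k - 1] if k > 0 else 0
            bwd[t][k] = (k + 1) * bwd[t + 1][k + 1] + down
    total = fwd[n][0]  # weighted number of complete histories
    if total == 0:
        raise ValueError("no histories of odd length")
    weighted = sum(k * fwd[t][k] * bwd[t][k]
                   for t in range(n + 1) for k in range(n + 1))
    return Fraction(weighted, total)
```

For $n = 4$ there are two shapes of history (insert, insert, delete, delete with weight 2, and insert, delete, insert, delete with weight 1), giving a mean integrated size of $(2 \cdot 4 + 1 \cdot 2)/3 = 10/3$.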

In this framework, results on list implementations had been obtained previously by the combinatorial techniques of Flajolet and the more probabilistic method of Louchard. The present treatment extended theirs by covering heap implementations as well as lists. Much more importantly, it brought to bear the powerful formalism of path integration. This formalism originated in physics, and can be viewed as a user-friendly form of Ventcel-Freidlin theory.

The  path  integral  formalism  makes  possible  a  very  general  analysis  of 
equiprobable  structure  histories:  so  long  as  the  structure  implementations  are 
of  the  simple  list  or  heap  form,  the  allowed  operations  can  be  considerably 
more  sophisticated  than  mere  insertions  or  deletions.  The  assumption  of 
strict  equiprobability  of  histories  is  also  unnecessary:  one  can  easily  treat 
models  in  which  histories  are  differently  weighted,  with  certain  operations 
taken  as  more  likely  than  others.  Such  models  are  particularly  difficult  to 
handle  combinatorially. 

The applicability of the path integral method is restricted less by the choice of datatype than by the choice of implementation. In general this method, with its assumption of a comparatively small state space, can be used to treat implementations that “uniquely represent their data.” Such implicit implementations allow at most one representation, up to order isomorphism, for any quantity of internally stored data.

It was shown that for such implementations the asymptotic determinism found by Louchard occurs very generally: in the limit of long histories, the most likely evolutions of the data structure are those that cluster tightly around a deterministic path. In consequence the integrated space and time costs, when normalized, converge as $n \to \infty$ to deterministic values. In the case of list implementations the limiting costs are quadratic in $n$, so that expected costs increase in the limit as fast as worst-case costs. But for heaps the expected costs turn out to increase rather less rapidly: the integrated space cost grows as $n^2/\sqrt{\log n}$, and the integrated time cost as $n \log n$. In expectation the spatial cost therefore differs markedly from its worst-case value, which is quadratic in $n$.

This unusual phenomenon, the clustering of integrated costs around deterministic values, probably does not occur in real-world databases. Its failure to appear is a sign that in the real world, data structure histories are far from being equiprobable.

4.5  Stochastic  orderings  and  mean  extremes 

An additional topic investigated by Robert Maier is more theoretical, but has applications to queueing theory and computer performance analysis. In collaboration with Peter Downey of the University of Arizona, he has investigated the relationship between stochastic orderings of random variables and the growth rate of their mean extremes [14].

Consider $n$ independent copies of a non-negative random variable $X$. Denote by $X_{(n)}$ the maximum of these copies. It is a random variable itself, and is called the maximum order statistic, or the extreme of the $n$-element sample. If $X$ is unbounded above, then as $n \to \infty$ the mean extreme $E\{X_{(n)}\}$ will also tend to infinity. Its growth rate is of interest.

If $X$ is an exponential random variable, it is easy to check that $E\{X_{(n)}\}$ grows logarithmically in $n$. Maier and Downey showed that it grows no faster than logarithmically if and only if $X$ is suitably bounded by an exponential: in particular, if and only if it is stochastically dominated by an exponential variate.
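For a unit-mean exponential $X$ the logarithmic growth can be checked exactly: $E\{X_{(n)}\}$ equals the harmonic number $H_n = 1 + 1/2 + \cdots + 1/n$, which exceeds $\log n$ by approximately Euler’s constant. The sketch below, with hypothetical function names, computes $H_n$ and a Monte Carlo estimate of the mean extreme for an arbitrary sampler.

```python
import math
import random


def mean_extreme_exponential(n):
    """E{X_(n)} for n i.i.d. unit-mean exponentials: the harmonic number H_n."""
    return sum(1.0 / i for i in range(1, n + 1))


def gap_from_log(n):
    """H_n - log n, which tends to Euler's constant (about 0.5772) as n grows."""
    return mean_extreme_exponential(n) - math.log(n)


def sampled_mean_extreme(sample, n, trials=2000, seed=0):
    """Monte Carlo estimate of E{X_(n)}; `sample(rng)` draws one copy of X."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(sample(rng) for _ in range(n))
    return total / trials
```

For example, `sampled_mean_extreme(lambda r: r.expovariate(1.0), 10)` should land near $H_{10} \approx 2.93$, while a heavier-tailed sampler such as `lambda r: r.paretovariate(2.0)` gives estimates that eventually grow faster than $\log n$.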
This result is really only a special case of a new result relating two stochastic orderings on non-negative random variables. The first is variability ordering, a partial order on non-negative random variables: one says that $X \le_c Y$, or that $X$ is less variable than $Y$, if $E\{h(X)\} \le E\{h(Y)\}$ for all increasing convex functions $h: \mathbb{R}_+ \to \mathbb{R}_+$. The second is the mean extreme ordering, defined as follows: $X \le_e Y$ means that the mean extremes of $X$ are all less than or equal to the corresponding mean extremes of $Y$, that is, $E\{X_{(n)}\} \le E\{Y_{(n)}\}$ for every $n \ge 1$.
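For finite distributions both orderings can be tested numerically. The sketch below uses the standard criterion that the variability (increasing convex) ordering of $X$ below $Y$ holds exactly when $E\{(X - t)^+\} \le E\{(Y - t)^+\}$ for every $t$. Distributions are given as `{value: probability}` dictionaries, and the function names are hypothetical.

```python
def stop_loss(dist, t):
    """E{(X - t)^+} for a finite distribution given as {value: probability}."""
    return sum(p * max(x - t, 0.0) for x, p in dist.items())


def less_variable(X, Y, tol=1e-12):
    """Test the increasing convex (variability) ordering for finite X, Y >= 0.

    Both stop-loss transforms are piecewise linear in t with kinks only at
    support points, and for t < 0 they differ by the constant E{Y} - E{X},
    so it suffices to compare them at t = 0 and at every support point.
    """
    points = {0.0, *X, *Y}
    return all(stop_loss(X, t) <= stop_loss(Y, t) + tol for t in points)


def mean_extreme(dist, n):
    """E{max of n i.i.d. copies} = sum over x of x * (F(x)^n - F(x-)^n)."""
    result, cdf_prev = 0.0, 0.0
    for x in sorted(dist):
        cdf = cdf_prev + dist[x]
        result += x * (cdf ** n - cdf_prev ** n)
        cdf_prev = cdf
    return result
```

With $X$ fixed at 1 and $Y$ equal to 0 or 2 with probability 1/2 each, $X$ is less variable than $Y$ but not conversely, and $E\{Y_{(n)}\} = 2(1 - 2^{-n}) \ge 1 = E\{X_{(n)}\}$ for every $n$, consistent with the first implication stated in the text.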

Maier and Downey showed that the orderings $\le_c$ and $\le_e$ are very closely related: $X \le_c Y$ implies $X \le_e Y$, and $X \le_e Y$ implies $X \le_c CY$ for some universal constant $C$. So $X$ is bounded in variability ordering by some constant multiple of $Y$ if and only if its mean extremes grow no more rapidly, as $n \to \infty$, than those of $Y$. But it is easy to show that $X$ is bounded in variability ordering by an exponential variate $Y$ if and only if $X$ is stochastically dominated by $Y$. So the above result follows.

4.6  Phase-type  distributions 

This work [15] contains an algorithm that constructs, from a given rational function, a Markov chain whose absorption-time distribution has the rational function as generating function. This provides an algebraic proof of C. O’Cinneide’s recent characterization of discrete phase-type distributions. The algorithm is based on an automata-theoretic algorithm due to M. Soittola.
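Only the forward direction is easy to sketch: given an initial distribution `alpha` and a substochastic transition matrix `T` on the transient states, the absorption time $\tau$ satisfies $P(\tau = k) = \alpha T^{k-1} t$ with exit vector $t = (I - T)\mathbf{1}$, and has generating function $z\,\alpha (I - zT)^{-1} t$. The inverse construction of [15], from a rational function to a chain, is not attempted here; the function names are hypothetical.

```python
import numpy as np


def phase_type_pmf(alpha, T, kmax):
    """[P(tau = 1), ..., P(tau = kmax)] for a discrete phase-type distribution.

    alpha: initial distribution over transient states (assumed to sum to 1);
    T: substochastic transient-to-transient matrix. Exit vector t = (I - T) 1,
    and P(tau = k) = alpha T^(k-1) t.
    """
    alpha = np.asarray(alpha, dtype=float)
    T = np.asarray(T, dtype=float)
    exit_vec = 1.0 - T.sum(axis=1)
    probs, row = [], alpha.copy()
    for _ in range(kmax):
        probs.append(float(row @ exit_vec))
        row = row @ T  # distribution over transient states one step later
    return probs


def generating_function(alpha, T, z):
    """E{z^tau} = z * alpha (I - z T)^{-1} t, valid when I - z T is invertible."""
    alpha = np.asarray(alpha, dtype=float)
    T = np.asarray(T, dtype=float)
    exit_vec = 1.0 - T.sum(axis=1)
    n = T.shape[0]
    return float(z * alpha @ np.linalg.solve(np.eye(n) - z * T, exit_vec))
```

A single transient state with `T = [[0.5]]` gives the geometric law $P(\tau = k) = 2^{-k}$, whose generating function $z/(2 - z)$ is the kind of rational function the construction in [15] starts from.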

Moreover the characterization of continuous phase-type distributions follows from the discrete characterization. In conjunction with the discrete-time algorithm, this yields an algorithm for constructing a Markov process representation for any distribution of continuous phase-type. This work succeeded in tying together the theory of Markov chains with absorption and the theory of finite automata.